| Metric | Description | Why it matters |
| --- | --- | --- |
| sample_count | Total number of measurement points collected in this run. | Higher sample counts increase confidence in regression and trend decisions. |
| warmup_samples | Initial samples excluded from steady-state analysis. | Prevents startup transients from distorting production-latency conclusions. |
| duration_seconds | Total measured run time in seconds. | Validates benchmark window length and comparability between runs. |
| latency.mean_ms | Average latency across measured samples. | Useful trend metric, but can hide tail outliers. |
| latency.p50_ms | Median response latency. | Represents typical user experience and baseline responsiveness. |
| latency.p95_ms | 95th percentile latency. | Captures tail latency for near-worst-case requests. |
| latency.p99_ms | 99th percentile latency. | Primary straggler signal; spikes indicate queueing, throttling, or contention. |
| throughput.mean_fps | Average number of frames (samples) processed per second. | Primary capacity metric for planning and SLO sizing. |
| throughput.min_fps | Lowest observed throughput in steady state. | Highlights instability or periodic slowdowns under load. |
| cpu.mean_percent | Average CPU utilization. | Identifies host bottlenecks, preprocessing pressure, or dataloader saturation. |
| gpu.mean_percent | Average GPU utilization. | Low GPU utilization combined with high latency usually points to an input-pipeline or synchronization bottleneck. |
| memory.mean_mb | Average memory footprint in MB. | Tracks memory pressure and headroom for larger batch sizes/models. |
| power.mean_w | Average power draw in watts. | Used for energy-cost modeling and efficiency tracking. |
| power.max_w | Peak power draw in watts. | Helps detect power spikes and validate power envelope constraints. |
| temperature.max_c | Maximum observed device temperature. | High peaks correlate with thermal throttling risk and performance collapse. |
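As a minimal sketch of how the latency and throughput aggregates above could be derived from raw per-sample measurements: the field names mirror the table, but the `summarize` helper, the nearest-rank percentile implementation, and the sample data are illustrative assumptions, not the actual tooling (a real pipeline would more likely use `numpy.percentile`).

```python
import statistics

def percentile(sorted_vals, p):
    """Nearest-rank percentile on an already-sorted list (illustrative)."""
    k = max(0, min(len(sorted_vals) - 1, round(p / 100 * (len(sorted_vals) - 1))))
    return sorted_vals[k]

def summarize(latencies_ms, warmup_samples, duration_seconds):
    # Drop the warmup samples so startup transients do not skew steady-state stats.
    steady = sorted(latencies_ms[warmup_samples:])
    return {
        "sample_count": len(latencies_ms),
        "warmup_samples": warmup_samples,
        "duration_seconds": duration_seconds,
        "latency.mean_ms": statistics.fmean(steady),
        "latency.p50_ms": percentile(steady, 50),
        "latency.p95_ms": percentile(steady, 95),
        "latency.p99_ms": percentile(steady, 99),
        "throughput.mean_fps": len(steady) / duration_seconds,
    }

# Hypothetical run: 5 warmup samples followed by 4 steady-state samples,
# one of which is a 30 ms straggler that p99 surfaces but the mean dilutes.
stats = summarize([12.0] * 5 + [10.0, 11.0, 12.0, 30.0],
                  warmup_samples=5, duration_seconds=2.0)
```

Note how the single 30 ms outlier dominates `latency.p99_ms` while barely moving `latency.p50_ms`, which is exactly why the table treats p99 as the primary straggler signal and the mean as a trend metric only.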